studio: extend offline DNS auto-detect to inference parent + training by danielhanchen · Pull Request #5512 · unslothai/unsloth

danielhanchen · 2026-05-17T11:08:18Z

Summary

Follow-up to #5505. That PR fixed the GGUF/llama-server load path. Two adjacent Studio code paths still burn 30-60s of soft-failed network timeouts before the worker subprocess starts when DNS to huggingface.co is dead and the model is already in the local HF cache. This PR extends the same DNS auto-detect helper to both.

Inference parent process (FastAPI side, before worker spawn)

routes/inference.py:load_model now runs ModelConfig.from_identifier inside _hf_offline_if_dns_dead so the soft-failed network calls reached transitively from there short-circuit on dead DNS:

utils/models/model_config.py LoRA-detect hf_model_info(identifier, token=...) call (was ~25s timeout)
utils/models/model_config.py hf_hub_download(identifier, 'adapter_config.json', ...) for remote LoRAs (was ~25s timeout, now bails fast via LocalEntryNotFoundError)
utils/transformers_version.py _check_tokenizer_config_needs_v5 raw urllib.urlopen(...)/tokenizer_config.json (was ~10s timeout)
utils/transformers_version.py _check_config_needs_550 raw urllib.urlopen(...)/config.json (was ~10s timeout)

The inline env-var check used by list_gguf_variants and detect_gguf_model_remote (added in #5505) is extracted into a shared _env_offline() helper to avoid duplicating the truthy-value parsing across new call sites.

Training subprocess (core/training/worker.py)

run_training_process now mirrors the DNS auto-detect already in core/inference/worker.py. On dead DNS, it sets HF_HUB_OFFLINE, TRANSFORMERS_OFFLINE, and HF_DATASETS_OFFLINE before importing torch/transformers/unsloth, so every from_pretrained, snapshot_download, and load_dataset call further down resolves from cache. Scope is per-subprocess (the orchestrator spawns a fresh worker per training run).

core/training/trainer.py:load_model skips the proactive hf_model_info gated-repo probe when _env_offline() is true. The API is unreachable, and a gated model that is already cached is exactly the scenario the user is trying to train against. from_pretrained surfaces the real error if access is actually denied.

Behaviour

Online: unchanged. Every network call still happens first.
Offline (DNS dead): the inference load and the training start fall through to cache in seconds instead of 30-60s.
User-set HF_HUB_OFFLINE=1 or TRANSFORMERS_OFFLINE=1: preserved end-to-end (the context manager added in #5505, "studio: load cached GGUF models when fully offline", already respects this).

Test plan

studio/backend/tests/test_offline_inference_parent.py: 7 new cases covering _env_offline() parsing, transformers_version urllib short-circuit, LoRA-detect API skip.
studio/backend/tests/test_offline_gguf_cache_fallback.py: 26 existing cases still pass after the env-check extraction.
Combined run: 33 passed in 3.74s.
CI green on 3.10/3.11/3.12/3.13 backend matrix.

Stacks on #5505. Recommend merging that first.

gemini-code-assist

Code Review

This pull request implements an offline fallback mechanism for Hugging Face Hub interactions by introducing DNS probing and local cache resolution for GGUF models. These changes prevent long network timeouts when huggingface.co is unreachable by automatically enabling offline modes and short-circuiting API calls. Feedback focuses on improving the implementation's thread safety by avoiding global socket timeout mutations, optimizing redundant path operations during cache scanning, and centralizing duplicated helper logic for DNS probing and environment checks.

gemini-code-assist · 2026-05-17T11:10:49Z

+    prev = socket.getdefaulttimeout()
+    socket.setdefaulttimeout(timeout)
+    try:
+        try:
+            socket.gethostbyname(host)
+            return False
+        except Exception:
+            return True
+    finally:
+        socket.setdefaulttimeout(prev)


Modifying socket.setdefaulttimeout is not thread-safe as it affects the entire process. In the FastAPI parent process, concurrent requests could interfere with each other's timeout settings. A safer, thread-safe alternative for a reachability check is to use socket.create_connection((host, 443), timeout=timeout), which respects the timeout without mutating global state and supports both IPv4 and IPv6 addresses.

References

Use socket.getaddrinfo() or high-level wrappers like socket.create_connection() to create sockets that support both IPv4 and IPv6 addresses, instead of hardcoding an address family.
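
The create_connection suggestion in the comment above could look roughly like the sketch below. The helper name host_unreachable and its defaults are illustrative and not taken from the PR. Two caveats: this checks TCP reachability rather than DNS alone, and the timeout bounds the connect step, while the name lookup inside create_connection may still block for as long as the platform resolver takes.

```python
import socket


def host_unreachable(host: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) cannot be opened.

    Hypothetical helper: the timeout is passed to this one connection
    attempt, so socket.setdefaulttimeout() is never touched.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return False
    except OSError:
        # Covers resolution failures (socket.gaierror), refusals, and timeouts.
        return True
```

Because nothing process-wide is mutated, concurrent requests in the FastAPI parent would not see a changed default timeout while the probe runs.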

gemini-code-assist · 2026-05-17T11:10:49Z

+                        matches = sorted(
+                            p.relative_to(snap).as_posix()
+                            for p in snap.rglob("*.gguf")
+                            if "mmproj" not in p.name.lower()
+                            and boundary.search(p.relative_to(snap).as_posix().lower())
+                        )


p.relative_to(snap).as_posix() is a relatively expensive path operation and is currently called twice for every file found by rglob. To improve efficiency, compute this value once and reuse the result.

matches = sorted(
    rel
    for p in snap.rglob("*.gguf")
    if "mmproj" not in p.name.lower()
    and (rel := p.relative_to(snap).as_posix())
    and boundary.search(rel.lower())
)

References

When a condition or calculated value is used across multiple conditional branches, compute it once and reuse the result to ensure consistency and improve maintainability.

To improve efficiency, avoid redundant data iterations and transformations.

gemini-code-assist · 2026-05-17T11:10:49Z

+    if "HF_HUB_OFFLINE" not in os.environ:
+        import socket as _socket
+
+        prev_timeout = _socket.getdefaulttimeout()
+        _socket.setdefaulttimeout(2.0)
+        try:
+            _socket.gethostbyname("huggingface.co")
+        except Exception:
+            os.environ["HF_HUB_OFFLINE"] = "1"
+            os.environ.setdefault("TRANSFORMERS_OFFLINE", "1")
+            logger.warning(
+                "huggingface.co unreachable; HF_HUB_OFFLINE=1 set for this worker."
+            )
+        finally:
+            _socket.setdefaulttimeout(prev_timeout)


The DNS probe logic here is duplicated from _probe_dns_dead in core/inference/llama_cpp.py. You should centralize this logical check into a single helper function and reuse it across the codebase to ensure consistency and simplify maintenance.

References

Centralize recurring or complex logical checks into a single helper function and reuse it across the codebase to ensure consistency and simplify maintenance.

gemini-code-assist · 2026-05-17T11:10:49Z

+def _env_offline() -> bool:
+    """True if HF_HUB_OFFLINE or TRANSFORMERS_OFFLINE is set to a truthy value."""
+    return os.environ.get("HF_HUB_OFFLINE", "").lower() in (
+        "1",
+        "true",
+        "yes",
+    ) or os.environ.get("TRANSFORMERS_OFFLINE", "").lower() in ("1", "true", "yes")


The _env_offline helper is duplicated in both utils/models/model_config.py and utils/transformers_version.py. Since model_config.py already imports from transformers_version.py, you should centralize this logic in transformers_version.py and import it here to improve maintainability and avoid logic drift.

References

Centralize recurring or complex logical checks into a single helper function and reuse it across the codebase to ensure consistency and simplify maintenance.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26053dd50b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-17T11:11:14Z

+        # so offline loads don't burn ~25s waiting for the HF API to time out.
+        # If the repo really is a LoRA, the worker still resolves it from
+        # cache later via the same env var.
+        if not is_lora and not is_local and not _env_offline():


Detect cached LoRA adapters while offline

When HF_HUB_OFFLINE/TRANSFORMERS_OFFLINE is set for a cached remote LoRA repo, this skips the only path that flips is_lora to true; from_identifier() then falls through as if the adapter repo were a full base model, so the later LoRA block never downloads/reads cached adapter_config.json and the inference worker receives is_lora=False. This regresses offline loading of already-cached HF LoRA adapters; the offline path should still check the local HF cache for adapter_config.json (or otherwise honor cached adapter metadata) instead of bypassing detection entirely.
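
A minimal sketch of the local check this comment asks for, assuming the standard hub cache layout (models--<org>--<name>/snapshots/<revision>/...). The function name find_cached_adapter_config and the explicit hub_cache argument are hypothetical; the PR's own helper may differ.

```python
from pathlib import Path
from typing import Optional


def find_cached_adapter_config(repo_id: str, hub_cache: Path) -> Optional[Path]:
    """Return a cached adapter_config.json for repo_id, or None.

    Assumed layout: <hub_cache>/models--<org>--<name>/snapshots/<rev>/<files>
    """
    repo_dir = hub_cache / ("models--" + repo_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    if not snapshots.is_dir():
        return None
    candidates = [p for p in snapshots.glob("*/adapter_config.json") if p.is_file()]
    # Prefer the most recently modified snapshot when several are cached.
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

A caller could set is_lora whenever this returns a path, without touching the network.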

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dcb30f1c94

chatgpt-codex-connector · 2026-05-17T11:36:15Z

+    socket.setdefaulttimeout(timeout)
+    try:
+        try:
+            socket.gethostbyname(host)


Bound DNS probes with a real timeout

When a user's resolver is blackholed or very slow, this new probe can block the FastAPI load path well beyond the intended 2 seconds because socket.setdefaulttimeout() does not bound socket.gethostbyname() name-resolution time. In that environment, every model load that enters _hf_offline_if_dns_dead() can hang before it reaches the cached/offline fallback, so use a resolver strategy with an actual deadline instead of relying on the socket default timeout.

chatgpt-codex-connector · 2026-05-17T11:36:15Z

+    if "HF_HUB_OFFLINE" in os.environ:
+        yield False


Keep offline scopes isolated per request

When two load requests overlap while DNS is down, the second one can enter here while the first request's context has temporarily set HF_HUB_OFFLINE; it then yields without owning/restoring the setting, and the first request's finally later pops the variable while the second request is still running. Because environment variables are process-wide, that second request can unexpectedly fall back to normal HF network calls/timeouts mid-scope; guard this with a lock/refcount or avoid using process-wide env mutation for per-request state.
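
One way to read the lock/refcount suggestion is sketched below. The names offline_scope, _active and _previous are illustrative only. The variable is still process-wide, so this only prevents one request from restoring it while another still depends on it; it does not make the state per-request.

```python
import os
import threading
from contextlib import contextmanager

_VAR = "HF_HUB_OFFLINE"
_lock = threading.Lock()
_active = 0        # scopes currently relying on the override
_previous = None   # value of _VAR before the first scope set it


@contextmanager
def offline_scope():
    """Set _VAR for the duration of the scope; the last scope out restores it."""
    global _active, _previous
    with _lock:
        if _active == 0:
            _previous = os.environ.get(_VAR)
            os.environ[_VAR] = "1"
        _active += 1
    try:
        yield
    finally:
        with _lock:
            _active -= 1
            if _active == 0:
                if _previous is None:
                    os.environ.pop(_VAR, None)
                else:
                    os.environ[_VAR] = _previous
```

With this shape, the first request finishing early leaves the variable set until the overlapping request also exits.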

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 84e3782ace

chatgpt-codex-connector · 2026-05-17T12:15:16Z

+    return os.environ.get("HF_HUB_OFFLINE", "").lower() in (
+        "1",
+        "true",
+        "yes",
+    ) or os.environ.get("TRANSFORMERS_OFFLINE", "").lower() in ("1", "true", "yes")


Honor ON in offline environment flags

When users set HF_HUB_OFFLINE=ON or TRANSFORMERS_OFFLINE=ON (standard truthy values for Hugging Face/Transformers), this helper returns False, so the new offline short-circuits below are skipped and _check_tokenizer_config_needs_v5 / _check_config_needs_550 still perform raw urllib fetches that can wait up to 10 seconds each before failing. This undermines the offline path added here for a supported env spelling; include on in the truthy set, and keep the duplicate offline helper consistent.
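
As a sketch of the requested change, the truthy set can live in one place and include "on". The names env_flag_enabled and env_offline are placeholders rather than the PR's identifiers.

```python
import os

# Case-insensitive spellings treated as "enabled"; "on" is the addition
# the comment above asks for.
_TRUTHY = frozenset({"1", "true", "yes", "on"})


def env_flag_enabled(name: str) -> bool:
    """True if the environment variable is set to a truthy spelling."""
    return os.environ.get(name, "").lower() in _TRUTHY


def env_offline() -> bool:
    """True if either offline flag is enabled."""
    return env_flag_enabled("HF_HUB_OFFLINE") or env_flag_enabled("TRANSFORMERS_OFFLINE")
```

Keeping a single set avoids the two copies of the helper drifting apart.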

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ec15082487

chatgpt-codex-connector · 2026-05-17T13:08:27Z

+    prev = socket.getdefaulttimeout()
+    socket.setdefaulttimeout(timeout)
+    try:


Avoid mutating process-wide socket defaults in DNS probe

_probe_dns_dead() changes the global socket default timeout via socket.setdefaulttimeout(timeout) before resolving huggingface.co. Because this helper is used on the FastAPI parent load path, concurrent requests in the same process can create sockets during this window and unexpectedly inherit the 2s default, causing unrelated outbound calls to fail or time out early. Please switch to a probe mechanism that enforces its own deadline without touching process-wide socket defaults.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 003761caf2

chatgpt-codex-connector · 2026-05-18T00:14:38Z

+    t = threading.Thread(target = _probe, daemon = True)
+    t.start()
+    t.join(timeout)
+    # Thread still running -> resolver wedged -> treat as dead.
+    return True if result[0] is None else result[0]


Reuse a bounded DNS probe thread

When DNS resolution hangs (the _probe thread never returns), join(timeout) exits but the daemon thread keeps running, and each call to this helper spawns another stuck thread. Because _hf_offline_if_dns_dead() is now used on model-load paths in the long-lived inference parent process, repeated loads during a resolver outage can accumulate blocked threads and eventually hit thread/memory limits, degrading or breaking further requests.
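
A possible shape for a probe that never stacks threads is sketched here, under the assumption that one in-flight lookup per host is enough. The names dns_dead and _inflight are hypothetical, and the resolver parameter exists only so the behaviour can be exercised without real DNS.

```python
import socket
import threading

_probe_lock = threading.Lock()
_inflight = {}  # host -> (thread, outcome list) for the current lookup


def dns_dead(host: str, timeout: float = 2.0, resolver=socket.gethostbyname) -> bool:
    """Return True if host does not resolve within timeout seconds."""
    with _probe_lock:
        thread, outcome = _inflight.get(host, (None, None))
        if thread is None or not thread.is_alive():
            outcome = []

            def _probe(result=outcome):
                try:
                    resolver(host)
                    result.append(False)
                except Exception:
                    result.append(True)

            thread = threading.Thread(target=_probe, daemon=True)
            _inflight[host] = (thread, outcome)
            thread.start()
        # Otherwise a lookup is already running: wait on it, do not spawn another.
    thread.join(timeout)
    # No recorded outcome means the resolver is still wedged: treat as dead.
    return outcome[0] if outcome else True
```

During a resolver outage, repeated loads wait on the same stuck thread, so at most one blocked thread exists per host.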

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-18T00:14:38Z

+    if "HF_HUB_OFFLINE" in os.environ:
+        yield False
+        return


Treat HF_HUB_OFFLINE as enabled only when truthy

This guard checks only for variable presence, so HF_HUB_OFFLINE=0 (a common explicit default) disables the DNS auto-detect path entirely. In that configuration, dead-DNS loads bypass the new offline short-circuit and fall back to slow network timeouts again, which defeats the regression fix this change is introducing.

Useful? React with 👍 / 👎.

Two follow-ups from the review pass on unslothai#5512:

* ModelConfig.from_identifier no longer skips the remote LoRA-detect hf_model_info call when _env_offline() is true. huggingface_hub short-circuits the call via OfflineModeIsEnabled in ~0ms when HF_HUB_OFFLINE is set, so the original 25s concern was moot once routes/inference.py wrapped the call in _hf_offline_if_dns_dead. Skipping the API meant users with a cached LoRA adapter (adapter_config.json on disk) got is_lora=False and the load failed. After the API call (which raises fast offline) a new cache-fallback walks the HF cache snapshot for adapter_config.json via the existing _iter_hf_cache_snapshots helper.
* test_hf_model_info_not_called_when_offline replaced. The old test raised AssertionError inside production code that catches Exception, so it passed even if the call happened. New tests use MagicMock and assert call_count >= 1, plus a fixture that stages a fake HF cache with adapter_config.json to verify the offline cache detection.

Test count goes from 7 to 8 in test_offline_inference_parent.py. Combined with test_offline_gguf_cache_fallback.py: 34 pass in 9.75s.
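
The test pitfall described in this commit message can be reproduced in a few lines. The function detect_lora below is a stand-in written for illustration, not the project's code; it only mimics production code that catches Exception around the API call.

```python
from unittest.mock import MagicMock


def detect_lora(identifier, model_info):
    """Stand-in for production code that swallows every Exception."""
    try:
        return "adapter_config.json" in model_info(identifier)
    except Exception:
        return False


def never_call(_identifier):
    # AssertionError subclasses Exception, so detect_lora swallows it.
    raise AssertionError("model_info should not be called")


# Vacuous check: this line completes without error even though the call happened.
detect_lora("org/lora", never_call)

# Observable check: the mock records the call no matter what gets caught.
probe = MagicMock(side_effect=OSError("offline"))
detect_lora("org/lora", probe)
assert probe.call_count >= 1
```

The mock-based assertion fails if the call is skipped, which the raise-inside-the-callee version could never detect.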

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d7f94471ec

chatgpt-codex-connector · 2026-05-18T04:37:14Z

+    # from_pretrained / snapshot_download / load_dataset calls resolve
+    # from cache instead of burning ~25s per call on retries. Scoped to
+    # this subprocess only (orchestrator spawns a fresh worker per run).
+    if "HF_HUB_OFFLINE" not in os.environ:


Run DNS probe when HF_HUB_OFFLINE is falsy

This guard treats any presence of HF_HUB_OFFLINE as an opt-out, so common explicit defaults like HF_HUB_OFFLINE=0 skip the auto-detect path entirely. In that configuration, a dead resolver no longer flips the worker to offline mode and training falls back to slow Hugging Face/network timeouts, which defeats the latency regression fix this block is meant to provide for cached models.
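
A guard keyed on truthiness rather than presence might look like the following sketch. The function maybe_force_offline and the injected dns_is_dead callable are hypothetical. Note that on dead DNS this version overwrites an explicit HF_HUB_OFFLINE=0, which is the behaviour the comment argues for and not necessarily what the merged code does.

```python
import os

_TRUTHY = {"1", "true", "yes", "on"}
_OFFLINE_VARS = ("HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE", "HF_DATASETS_OFFLINE")


def maybe_force_offline(dns_is_dead, environ=os.environ) -> bool:
    """Switch the worker to offline mode when DNS is dead.

    dns_is_dead: zero-argument callable that runs the probe (assumed).
    Returns True if this call changed the environment.
    """
    if environ.get("HF_HUB_OFFLINE", "").lower() in _TRUTHY:
        return False  # user already opted in; no probe needed
    if not dns_is_dead():
        return False  # online: leave the environment alone
    for var in _OFFLINE_VARS:
        environ[var] = "1"
    return True
```

Calling this at the top of the subprocess, before the heavy imports, keeps the effect scoped to that worker.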

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6030709a47

chatgpt-codex-connector · 2026-05-18T05:02:04Z

+    if _env_offline():
+        _tokenizer_class_cache[model_name] = False
+        return False


Read cached configs before defaulting offline

When HF_HUB_OFFLINE is set for a cached remote model whose required transformers tier is only discoverable from tokenizer_config.json/config.json (for example a forked Gemma4-style repo without the hard-coded name substring), this new early return skips the cached HF snapshot and forces _check_tokenizer_config_needs_v5 to False; the duplicated guard in _check_config_needs_550 does the same. activate_transformers_for_subprocess() then selects the default 4.57.x tier even though the needed metadata is already on disk, so offline training/inference of those cached models can fail with the wrong transformers version instead of avoiding only the network fetch.

Useful? React with 👍 / 👎.

#5505 fixed the GGUF/llama-server load path. Studio still has two adjacent code paths that burn ~30-60s of soft-failed timeouts before the worker subprocess starts when DNS to huggingface.co is dead and the model is already in the local HF cache.

Inference parent process (routes/inference.py:load_model):

* ModelConfig.from_identifier now runs inside _hf_offline_if_dns_dead so the LoRA-detect hf_model_info call and the urllib config probes in utils/transformers_version.py short-circuit when DNS is dead.
* utils/models/model_config.py: extracted the inline HF_HUB_OFFLINE/TRANSFORMERS_OFFLINE check used by list_gguf_variants and detect_gguf_model_remote into a shared _env_offline() helper, then reused it to gate the LoRA-detect hf_model_info call.
* utils/transformers_version.py: _check_tokenizer_config_needs_v5 and _check_config_needs_550 now early-return False when offline instead of issuing a 10s urllib.urlopen against huggingface.co/raw/main.

Training worker (core/training/worker.py:run_training_process):

* Add the same 2s DNS probe used by core/inference/worker.py at the top of the training subprocess. On failure, set HF_HUB_OFFLINE, TRANSFORMERS_OFFLINE, and HF_DATASETS_OFFLINE before the rest of the subprocess imports torch/transformers/unsloth, so every from_pretrained, snapshot_download, and load_dataset call below resolves from cache. Scope is per-subprocess; the orchestrator always spawns a fresh worker per training run.

Training trainer (core/training/trainer.py:load_model):

* Skip the proactive hf_model_info gated-repo probe when _env_offline() is true. The API is unreachable anyway, and a gated model that is already cached is exactly the scenario the user is trying to train against. from_pretrained surfaces the real error if access is actually denied.

Tests (tests/test_offline_inference_parent.py, 7 new cases):

* _env_offline truthy/falsy parsing across HF_HUB_OFFLINE and TRANSFORMERS_OFFLINE.
* transformers_version urllib short-circuit when offline.
* LoRA detect hf_model_info skip when offline.

Existing tests/test_offline_gguf_cache_fallback.py still passes (26 cases) because the inline env check was extracted, not changed.

The studio test stub convention only included the 6 httpx exception names that existing callers needed. Newer huggingface_hub (1.15+) imports HTTPError, Response, Request, HTTPStatusError, AsyncClient, and more at module import time. When httpx is truly absent, chasing those names with the stub becomes a treadmill. Use the real package when installed (the CI install list already includes httpx, so this is the production environment). Fall back to the stub only when httpx is genuinely missing. No code under test changes.
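
The "real package first, stub only as fallback" pattern described in this commit message can be sketched generically. The helper ensure_module and its stub_attrs argument are illustrative and are not the test suite's actual code.

```python
import importlib.util
import sys
import types


def ensure_module(name: str, stub_attrs: dict) -> types.ModuleType:
    """Return the real module if importable, else register and return a stub."""
    if name in sys.modules:
        return sys.modules[name]
    if importlib.util.find_spec(name) is not None:
        # Installed: use the real package so every attribute exists.
        return importlib.import_module(name)
    stub = types.ModuleType(name)
    for attr, value in stub_attrs.items():
        setattr(stub, attr, value)
    sys.modules[name] = stub  # later "import name" statements get the stub
    return stub
```

With the real package preferred, the stub only needs the handful of names used when the dependency is absent.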

Same fix as #5505's _probe_dns_dead refactor: run gethostbyname on a daemon thread with join timeout so concurrent sockets in the parent interpreter never inherit a process-wide socket.setdefaulttimeout mutation. Adds a static-pin regression test that the inference parent file does not regress on this.

for more information, see https://pre-commit.ci

Shorten the longer explanatory comments added by this PR while keeping the WHY of each non-obvious branch:

- trainer.py: collapse the 5-line proactive gated-check comment.
- training/worker.py: trim the offline auto-detect preamble and the "logger isn't configured" note.
- routes/inference.py: shorten the DNS-probe wrap rationale.
- transformers_version.py: collapse the two urllib short-circuit notes.
- model_config.py: shorten the LoRA detect + cache-fallback notes.
- tests/test_offline_inference_parent.py: tighter module docstring, trim class docstrings, drop multi-line explainer comments inside the tests; behaviour and coverage unchanged (9/9 tests still pass).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 72ff31a15d

chatgpt-codex-connector · 2026-05-18T05:44:24Z


+    # Offline: skip the 10s urllib fetch (fail-open to lower tier).
+    if _env_offline():
+        _config_needs_550_cache[model_name] = False


Don't cache transient offline tier defaults

When _hf_offline_if_dns_dead() temporarily enables offline mode for a parent-process load and this check runs for a remote model without a local config.json, this stores False in the process-wide _config_needs_550_cache. Because the cache is consulted before any later local/network read, the same model remains classified as not needing transformers 5.5.0 even after DNS recovers, so the inference orchestrator/vision detection can keep spawning workers with the wrong transformers tier. The offline shortcut should avoid populating the normal cache, or key the cached value on the offline state.

Useful? React with 👍 / 👎.

danielhanchen requested a review from rolandtannous as a code owner May 17, 2026 11:08

gemini-code-assist Bot reviewed May 17, 2026

chatgpt-codex-connector Bot reviewed May 17, 2026

danielhanchen mentioned this pull request May 17, 2026

__init__: include HF_DATASETS_OFFLINE in the offline env cross-sync unslothai/unsloth-zoo#664

Merged

3 tasks

chatgpt-codex-connector Bot reviewed May 17, 2026

danielhanchen force-pushed the offline-extend-to-parent-and-training branch from ec15082 to cf0fc65 Compare May 18, 2026 00:09

chatgpt-codex-connector Bot reviewed May 18, 2026

danielhanchen force-pushed the offline-extend-to-parent-and-training branch from 003761c to 47b9e8c Compare May 18, 2026 04:33

chatgpt-codex-connector Bot reviewed May 18, 2026

danielhanchen and others added 6 commits May 18, 2026 05:40

[pre-commit.ci] auto fixes from pre-commit.com hooks

e3f75f0

for more information, see https://pre-commit.ci

danielhanchen force-pushed the offline-extend-to-parent-and-training branch from 6030709 to 72ff31a Compare May 18, 2026 05:41

chatgpt-codex-connector Bot reviewed May 18, 2026

danielhanchen merged commit 3876c87 into main May 18, 2026
32 of 33 checks passed

danielhanchen deleted the offline-extend-to-parent-and-training branch May 18, 2026 07:31
